gSpan: Graph-Based Substructure Pattern Mining
Authors
Abstract
We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based Substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
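The abstract's central idea is to map each graph to a minimum DFS code and use that code as its canonical label. The brute-force sketch below illustrates this idea under several assumptions. The graphs are small, connected, undirected, and labeled. Each graph is given as a vertex-label list plus an edge-label dict, which is a hypothetical input format. Every DFS traversal is enumerated rather than pruned. Python's built-in list/tuple comparison stands in for gSpan's own DFS lexicographic order, which ranks forward and backward edges specially. The function names `dfs_codes` and `min_dfs_code` are illustrative only.

Isomorphic graphs have the same set of DFS codes, and each code records every edge. The minimum under any fixed total order therefore still acts as a canonical label. As far as we understand it, though, only gSpan's specific order lets its depth-first search safely prune patterns whose codes are not minimal.

```python
from typing import Dict, List, Tuple

# One DFS-code entry: (i, j, label of i, edge label, label of j), where i and j
# are the discovery indices of the endpoints. Forward (tree) edges have i < j,
# backward edges have i > j.
CodeEdge = Tuple[int, int, str, str, str]
# Hypothetical input format: edge labels keyed by a (u, v) pair of vertex ids.
EdgeLabels = Dict[Tuple[int, int], str]


def dfs_codes(vlabels: List[str], elabels: EdgeLabels) -> List[List[CodeEdge]]:
    """Enumerate every DFS code of a small connected, undirected, labeled graph."""
    adj: Dict[int, List[int]] = {v: [] for v in range(len(vlabels))}
    for u, v in elabels:
        adj[u].append(v)
        adj[v].append(u)

    def elabel(u: int, v: int) -> str:
        return elabels[(u, v)] if (u, v) in elabels else elabels[(v, u)]

    codes: List[List[CodeEdge]] = []

    def grow(index: Dict[int, int], path: List[int], code: List[CodeEdge]) -> None:
        if len(index) == len(vlabels):
            codes.append(code)
            return
        # Backtrack along the DFS stack until the top vertex has an unvisited neighbour.
        while all(w in index for w in adj[path[-1]]):
            path = path[:-1]
        v = path[-1]
        # Branch over every possible choice of the next forward (tree) edge.
        for w in adj[v]:
            if w in index:
                continue
            j = len(index)
            new_code = code + [(index[v], j, vlabels[v], elabel(v, w), vlabels[w])]
            # Right after discovering w, list its backward edges to earlier
            # vertices (all DFS ancestors), in ascending order of their index.
            back = sorted((u for u in adj[w] if u in index and u != v),
                          key=lambda u: index[u])
            for u in back:
                new_code.append((j, index[u], vlabels[w], elabel(w, u), vlabels[u]))
            grow({**index, w: j}, path + [w], new_code)

    for start in range(len(vlabels)):
        grow({start: 0}, [start], [])
    return codes


def min_dfs_code(vlabels: List[str], elabels: EdgeLabels) -> List[CodeEdge]:
    # Simplification: built-in lexicographic list/tuple comparison replaces
    # gSpan's DFS lexicographic order; the minimum is still a canonical label.
    return min(dfs_codes(vlabels, elabels))


if __name__ == "__main__":
    # Two numberings of the same labeled triangle, plus a non-isomorphic one.
    g1 = (["C", "C", "O"], {(0, 1): "s", (1, 2): "s", (0, 2): "d"})
    g2 = (["O", "C", "C"], {(0, 1): "s", (1, 2): "s", (0, 2): "d"})
    g3 = (["C", "C", "O"], {(0, 1): "d", (1, 2): "s", (0, 2): "s"})
    print(min_dfs_code(*g1) == min_dfs_code(*g2))  # isomorphic
    print(min_dfs_code(*g1) == min_dfs_code(*g3))  # not isomorphic
```

This exhaustive enumeration grows exponentially with graph size. It is meant only to make "minimum DFS code as canonical label" concrete. It is not a substitute for gSpan's pruned search.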
Similar resources
Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism
CUDA is an advanced massively parallel computing platform that can provide high performance computing power at much more affordable cost. In this paper, we present a parallel graph-based substructure pattern mining algorithm using CUDA Dynamic Parallelism. The key contribution is a parallel solution to traversing the DFS (Depth First Search) code tree. Furthermore, we implement a parallel frequ...
On the discovery of group-consistent graph substructure patterns from brain networks.
Complex networks constitute a recurring issue in the analysis of neuroimaging data. Recently, network motifs have been identified as patterns of interconnections since they appear in a significantly higher number than in randomized networks, in a given ensemble of anatomical or functional connectivity graphs. The current approach for detecting and enumerating motifs in brain networks requires a...
Frequent Sub-graph Mining on Edge Weighted Graphs
Frequent sub-graph mining entails two significant overheads. The first is concerned with candidate set generation. The second with isomorphism checking. These are also issues with respect to other forms of frequent pattern mining but are exacerbated in the context of frequent sub-graph mining. To reduce the search space, and address these twin overheads, a weighted approach to sub-graph mining...
Optimizing gSpan for Molecular Datasets
We propose two optimizations for mining molecular databases with gSpan, one of the state-of-the-art graph mining algorithms. Both optimizations apply to the enumeration of subgraph occurrences in a graph database, which is, also according to our profiling, the most expensive operation of gSpan. The first optimization reduces the number of subgraph isomorphisms that need to be accessed for prope...
On Canonical Forms for Frequent Graph Mining
In approaches to frequent graph mining that are based on growing subgraphs into a set of graphs, one of the core problems is how to avoid redundant search. A powerful technique to overcome this problem is a canonical description of a graph, which uniquely identifies it, and a corresponding test. This paper introduces a family of canonical forms that are based on systematic ways to construct spa...
Publication date: 2002